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Assessing statistical significance of periodogram peaks 
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ABSTRACT 

The least-squares (or Lomb-Scargle) periodogram is a powerful tool which is used rou- 
tinely in many branches of astronomy to search for periodicities in observational data. 
The problem of assessing statistical significance of candidate periodicities for different 
periodograms is considered. Based on results in extreme value theory, improved an- 
alytic estimations of false alarm probabilities are given. They include an upper limit 
to the false alarm probability (or a lower limit to the significance) . These estimations 
are tested numerically in order to establish regions of their practical applicability. 
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1 INTRODUCTION 

Analysing astronomical time series, one often has to choose 
between at least two hypotheses, a base one TL and an alter- 
native one IC, based on the existing data array. In the signal 
detection problem, one should check whether the observa- 
tions are consistent with some base model or they contain an 
extra deterministic signal. Under presence of random errors, 
such problem can be solved in a probabilistic sense only. We 
are never protected from mistakes of two kinds. They are the 
false retraction of TL (the 'false alarm') and the false non- 
retraction of TL (the false non-detection). False alarms are 
generally believed to be more dangerous, hence the problem 
of estimation of the false alarm probability (hereafter FAP) 
associated with a candidate signal is very important. Given 
some small critical value FAP, (between 10 -3 and 0.1 usu- 
ally), we could claim that the candidate signal is statistically 
significant (if its FAP < FA P, ) or is n ot (FA P > FAP,). 

For the iLombl l | 19761 ) - IScargld (|l982l ) periodogram 
(hereafter also L— S), the base hypothesis is that the observa- 
tions are pure zero-mean uncorrected and Gaussian errors 
(also called the white Gaussian noise) . The alternative one is 
that a sinuous harmonic is also present. Every single value of 
the L-S periodogram represents a test statistic for the corre- 
sponding problem of hypotheses testing. In routine practical 
cases, however, the period of a possible signal is not known a 
priory and we have to scan many periodogram values within 
a wide frequency range. In this case, the FAP is provided by 
the probability distribution of the maximum periodogram 
value under the base hypothesis (i.e., without signal in the 
data). Existing methods of calculating this distribution for a 
continuous frequency range require time-consuming Monte- 
Carlo simulations. The aim of the present paper is to propose 



analytic approximations which could allow to avoid Monte- 
Carlo simulations (at least in many practical cases). Such 
approximations of the distribution of the maximum have al- 
ready been constructed by mathematicians specializing in 
the field of extreme values of random processes. In the Sec- 
tion [3j these results are adapted for and extended to the 
specific features of the periodogram analysis of astronomi- 
cal time series. In the Section [3J numerical simulations are 
used to explore the quality of the analytic results and to 
show regions of their practical applicability. 



2 GENERAL FORMULATIONS 

Let us recover the principles of the periodogram analysis in 
a somewhat more general formulation than usuallyQ 

Let xi, X2 ■ ■ ■ xn be observations made at N epochs 
ti,ta, . . .tff. The errors of Xi are assumed to be indepen- 
dent and Gaussian with standard deviations Oi. Each value 
of the periodogram can be recovered as a test statistic that 
allows to conclude, how likely is the hypothesis that the 
data contain a signal of a given frequency /. Mathemat- 
ically we should check, whether the observations are fit- 
ted well by some base model having only dn free param- 
eters On, or they require an enlarged model of dtc pa- 
rameters Ok: = {6n,0} with d = d)c — dn parameters 
of an extra periodic signal. We will assume that for any 
fixed frequency both models are linear and construct them 
by means of dn and dtc base functions forming vectors 
<pn(t) and <pic(t,f) ~ {<fn (i), f(t, /)}. Thus the base fit 
model is fin(t,Qn) ~ &n ■ <Pn(t)i the model of the sig- 
nal is n(t,d, f) = 6 ■ <p(t,f) and the complete fit model 
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1 Several mathematical notations, used in the present paper, are 
described in the Appendix lAl 
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is ii K {t,e K ,f) = K ■ <p K {t,f) = fj,n(t,On)+Kt,0,f)- We 
wish to test, whether the hypothesis TL : 6 = should be 
rejected in favour of the alternative fc(f) : 9 7^ 0. 

For the L-S periodogram dn — 0, d — 2, and the signal 
model is given by a harmonic function Q i cos uit + 62 sin urf 
(here w = 2-nf). ISchwarzenberg-Czernvl (fl998allbl'> c onsid - 
ered cases with dn — and arbitrary d. iFerraz-Mellol (|l98ll ) 
put dn = 1 and added a floating co nstant term to t h e har - 
monic model with d = 2, whereas ICumming et~aT1 (Il999l ) 
accounted also for possible linear trend (dn = 2). 

An optimal statistical test , solving such problem in gen- 
eral, is developed rather well (|Lehmanll 19791 . chapter 7). At 
first, one should compute the minima (by 0h,k) of the func- 
tion x 2 = ({% — M>c) 2 ) under hypotheses Ti. and IC(f). This 
may be done by m eans of any accessible linear le ast-squares 
algorithm (see also lSchwarzenberg-Czern v 1998a b). If <jj are 
known precisely, both minima Xn an( i Xic(/) can be com- 
puted and the least-squares periodogram may be defined as 
an advance in x 2 provided by the transition from Ti. to fC(f): 



z(f)=[xn~xUf)] A 



(1) 



The error variances are often not known precisely and we 
have to estimate them from the time series, explicitly or 



implicitly. It is usually assumed that Oi = na n 



where 



the 'measured' uncertainties <r mes ,i determine the weighting 
pattern of the time series, whereas the coefficient k is uncon- 
strained. In this case, only the ratio Xh/Xk can be computed 
exactly, and the periodogram {TJ has to be modified. We will 
consider the following modified periodograms: 



xli - xVj) 

ZXu ' 



z 2 (f) = Nk: 



Xh - xKf) 

2xl(f) ' 



Xn 



Mf) = - ln xUfY 



(2) 



Here, Nn = N — dn and Nk. — N — die are the numbers 
of degrees of freedom in Xn an d Xjo correspondingly. The 
periodograms 21 (/) and 22 (/) are the well-known normaliza- 
tions of z(f) by variances of residuals under the respective 
hypotheses. All periodograms ([2]) are entirely equivalent be- 
cause they are unique functions of each other: 
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-2z 3 /N K 



2z 2 /N K 
(l-2zx/Nn)(l + 2z 2 /N K ) 



(3) 



3 FALSE ALARM PROBABILITY 

Let us pick any of the periodograms introduced above, and 
denote it as Z(f). If the frequency of a possible signal was 
known, the false alarm probability could be retrieved as 
FAP aing i c = 1 - P sing io(^), where P sin gi c (^) is the cumu- 
lative distribution function of Z(f) (taken under the base 
hypothesis). Under the hypothesis H., the statistic 2z follows 
a x 2 -distribution with d degrees of freedom, 2z 2 /d obeys a 
Fisher-Snedecor F-distribution with d and N/c degrees of 
freedom, and 2z\/Nn obeys a beta distribution wi th the 
same numbers of degrees of freedom (|Lehmanlll979l . §7.1). 
Using relations ([3]), the distribution function of 23 can be 
derived easily. The corresponding expressions of false alarm 
probability for d — 2 are given in Table [T] Note that the 
third modified periodogram obeys exactly the same distri- 
bution as the basic one, if d = 2. 



Now let us assume that we scan all frequencies from the 
interval [0, / max ] and look for the maximum value Z m&K = 
max[ ,/ max ] Z(f). Then the false alarm probability, associ- 
ated with this maximum, is FAP max = l™Pmax(Zmax, /max), 

where P m ax(Z ma x, /max) denotes the cumulative distribu- 
tion function of Z ma x (under the base hypothesis). Pre- 
cise expression for the latter distribution is not known 
even for equally spaced time series. It is always possi- 
ble to use Monte-Carlo simulations to obtain this func- 
tion, but this way is very time-consuming, especially for 
the most important region of low false alarm probabili- 
ties (high s ignificances). The function P m a x(-Z, /max) is often 
computed (|Schwarzenberg-Czernv 1998a b) as 



Pmax(^5 /max) ~ -Psinglc(^) 



JV ind (/ ma x) 



(4) 



where iVi n d(/max) is an effective 'number of independent fre- 
quencies' found within [0, /max] - There is no general analytic 
expression for the quantity A r i n d, but it is often suggested to 
use a short Monte-Carlo simulation to assess it and then ex- 
trapo late (0} to low FAP l|Cumminsll2004l ; lHorne fc Baliunasl 
1986). However, the multiple-trial formula Q is only heuris- 
tic and is not necessarily precise even for equally spaced 
observations which don't produce significant aliasing. 

A better estimation of Pm^{Z, /max) may be obtained 
using the theory of stochastic processes. The theory of ex- 
tremes of random processes is developed in mathematical 
literature rather deeply. F or our a i ms, it is worth to men- 
tion the series of works bv lDaviesI (|l977l . Il987l . l2002r i. This 
author considered (in rather general formulations) extreme 
value distributions for x 2 > F, and beta random processes 
that may include our periodograms 2 and 21,2 as special 
cases. The main result of these works is an analytic lower 
limit to the corresponding extreme value distributions. This 
result is potentially very useful for astronomical applica- 
tions, because it yields directly an upper limit to the false 
alarm probability and a lower limit to the significance of 
a candidate periodicity. However, the formulae published in 
the cited papers are not yet ready for usage and require some 
adaptation to specific applications. Moreover, these results 
can be improved to obtain not only an upper limit, but an 
uniform approximation to the false alarm probability, that 
would be good for low spectral leakage at least. 

A brief description of these results, adapted for the un- 
even time series analysis, along with detailes of my exten- 
sions, is given in the Appendix [B] Summarizing them, the 
'Davies bound' may be written down as 



FAP 

max {Z, /max) «S FAP sing l e (Z) + T (Z, f n 



(5) 



The function r will be specified below. If the aliasing effects 
may be neglected within the frequency band being scannecQ, 
and if also /max is large enough, then 

Pmax^, /max) « P s inglc(Z) e ~< Z ^\ ( 6 ) 

The right hand side in ((5| should approach the false alarm 
probability more closely for large Z (even the asymptotic 
equality under Z — > 00 is expected, but not proved strictly 
yet). In general, the quantity r(Z, / max ) looks like 



2 This means that the spectral window of the time series has no 
significant peaks in the doubled frequency band [0, 2/ ma x], except 
for the main one at / = 0. 
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(7) 



for the basic least-squares periodogram |T} and like 
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for the modified periodograms ([2]). Here the coefficient 
F (^2 L ) / r ( iX % ±i )- Note that the asymptotic 
(2/N n x) (d ' 1)/2 l 1 holds true for TV -> oo. The factor 
^4(/max) depends on the bases (p and tpn, on the time series 
sampling and on the weighting pattern. Unfortunately, the 
general form of v4(/ max ), obtained in the Appendix|Bj is not 
simple. For now, let us restrict ourselves to the L-S peri- 
odograms and neglect by aliasing effects. In the next section 
we will show that such approximation for ^4(/ max ) works well 
even for strong aliasing. Of course, anyone is welcome to cal- 
culate j4(/ m ax) numerically from the formulae given in the 
Appendix: such work is still much less computationaly ex- 
pensive than Monte-Carlo simulation of P ma x(Z, /max). The 
practical quality of the expressions (lol(i[) will be also explored 
numerically in the next section. 

To derive j4(/ max ) from the formula (|B7|I . we should 
calculate the eigenvalues of the matrix M, which is defined 
by the group of equalities (|B4|) . To perform this, we have 
to concretize the functions <p(t,f). For the usual L-S peri- 
odogram the harmonic base 



tp(t, /) = {coswt, sinuit} (ui — 2nf) 
produces the matrices 



(9) 



Q 

S 

R 
M 



1 + cos 2ut 
sin 2ut 



sin 2u>t 
1 — cos 2uot 



—tsm2ut t + tcos2ujt 
— i+t cos 2ut t sin 2ujt 



= 2^ 



t 2 - t 2 cos 2ut 



-t 2 sin 2ut 



Q _1 (R 



-t 2 sin 2iot t 2 + t 2 cos 2ut 
-S T Q- 1 S). 



(10) 



If we consider the alias-free case, the terms in (|10[) contain- 
ing sine and cosine functions of frequencies 2/ ^ 2/ max are 
averaged out. Under this approximation M ~ 47r 2 Dt I, where 
Df. = t 2 — t is the weighted variance of the observational 
epochs. Then both eigenvalues required are equal to the con- 



stant 47r 2 Di and A(f n - 



2n 3 / 2 W, where W = / max T c 



a is 

a rescaled frequency bandwidth and T c g — V 4nISt is an ef- 
fective time series length. If ti are spanned uniformly and all 
Oi are equal, then T e g almost coincides with an actual time 
series span. Table [T] contains the alias-free approximations 
of t(Z, / max ) for all L-S periodograms considered. One may 
use these expressions and the ones (|5l6p to write down the 
corresponding alias-free approximation of P max (Z, / max ) and 
its Davies bound. Routinely we deal with rather large values 
of z and W. In this case either the factor P s ingie(Z) in © 
or the term FAP s j ng i e (Z) in ([5]) may be safely neglected. For 
instance, for the usual L-S periodogram 



£ (2,/n 



(1 
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Table 1. False alarm probabilities for the Lomb-Scargle peri- 
odogram and its modifications (d = 2). 



Z(f) FAP single (Z) 



t(Z, /max), approximately 
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^w.K ^ j 1 r (^-^-) ma y be ne - 

glected for N-n ^ 10. If the spectral leakage is low, FAP max 
t(Z, / ma x) for realistic values of parameters (see text). 

Such alias-free approximations are valid if / max is well re- 
solved (W > 1) and if the spectral leakage is low. Only the 
latter assumption is practically significant. If one worries 
about strong spectral leakage, the approximate inequality 



FAPmax(z, /max) £ &~' + We"y/z « We~'y/z, 



(12) 



holds true for the basic L-S periodogram. The rela- 
tions (|11I12|I are equally valid if the base model is not empty, 
but includes a low-order polynomial drift and/or several har- 
monics of fixed frequencies that may be considered as inde- 
pendent on any frequency within the range being scanned. 

For large TV, every modified periodogram 21,2,3 obeys 
approximately the same extreme value distribution as the 
basic one. However, this convergence is not uniform in Z. 
It is easy to derive from (|7I8|I that for the periodograms 
zi,2(f) an extra condition Z <C %/TV must be satisfied to 
keep relative errors of FAP low. This condition is rarely sat- 
isfied in practice. For the periodogram zz, a corresponding 
condition Z <C TV is mild and is often satisfied in practical 
applications. This fact requires to consider the third mod- 
ified periodogram more closely. The log-likelihood function 
of our Gaussian observations is given by 



ln£ = — X j2 — In <Ji + const . 



(13) 



As we adopted <r; = Kcr m es,i, this expression may be rewrit- 
ten as ln£ = — xV(2k 2 ) — TV In ft + const, where x 2 does 
not depend on ft. Maximizing ln£ by ft under the hypothe- 
ses K, and Ti yields that the logarithm of the ratio of the 
corresponding likelihood maxima equals to -#£3. 



4 NUMERICAL SIMULATIONS 

Let us test the analytic results introduced above. For this 
purpose, we will use simulations of time series of TV quasi- 
random data points imitating the white Gaussian noise. The 
temporal moments ti cover a segment of a length T. The 
uncertainties cr; are equal to each other unless otherwise 
stated. For every simulation discussed below, no less than 
10 5 Monte-Carlo trials were generated (ti and at were fixed 
during every such simulation, of course) . This should provide 
accuracies of simulated FAPs about 1% for FAP = 0.1 and 
about 10% for FAP = 10~ 3 . The simulated tale of FAP < 
10 -3 often showed unstable deviations comparable with the 
false alarm probability. 
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max z(f) maximum periodogram value 



Figure 1. Simulated vs. analytic false alarm probability for 
the L-S periodogram with no forced data gapping. Simulations 
for 1000 evenly, 100 and 30 randomly spaced observations, 10 s 
Monte-Carlo trials, / max T = 50,500,5000 (bunches from left 
to right). In the even cases, the simulated curves almost coin- 
cide with their alias-free approximations. Due to exceedance of 
the Nyquist frequency, there is no curve for the even case and 
/ max T = 5000. All graphs of analytic expressions are plotted for 
T c ff = T (this equality holds true within a few per cent for the 
three time series used here). 

If the time series consists of large number of equally 
spaced observations, any aliasing should be negligible. In- 
deed, in such a case the Davies bound (I12|l appears very 
sharp (for FAP < 0.1) and the analytic approximation (|11[) 
perfectly follows the simulated distribution (Figs. [Tl2)) . How- 
ever, even time series don't allow to search frequencies less 
than the Nyquist one /pjy = (N — 1)/(2T). An uneven time 
series allows to access much lower frequencies. However, this 
access cannot be free of any charge. Within a wide fre- 
quency range (/max > /Ny), an essential aliasing is normally 
present purely due to random fluctuations of observational 
moments, even if there is no physical necessity for their gap- 
ping. Thus we may expect that for W > N a significant 'nat- 
ural' aliasing should take place. According to the numerical 
results shown on Fig. [T] the alias-free approximation indeed 
becomes significantly less precise when N decreases, but for 
large FAP only (say, larger than a few per cent). Even for 
W > 100AT the loss of precision remains moderate for prac- 
tically important values of FAP. We can quite use |TT} for 
practical calculations even if W is ten times larger than N 
(or even larger, depending on the desirable precision). 

When a 'physical' spectral leakage is large, the quality 
of the alias-free approximation depends on the frequency 
range too. If / max does not exceed the Nyquist frequency of 
periodic breaking of observations then the interval [0, /max] 
is free from aliasing and we may use (|11[) without signifi- 
cant loss of precision. If the frequency range increases, the 
model (|11|> comes off from the real distribution and some- 
what overestimates the false alarm probability. Such a sim- 
ulation is shown in Fig. [3] In this example, the frequency 
of periodic data breaks corresponds to / max T = 9 and the 
respective Nyquist frequency corresponds to a half of this 
value (/ max T = 4.5). 

Although errors of the alias-free model may become 
practically significant for some extremal situations, they are 
not very large and (more important) not fatal. The signif- 
icance of a candidate periodicity is underestimated, what 
does not favour to false alarms. The aliasing may only de- 



Figure 2. Simulated vs. analytic false alarm probability for the 
modification Z2 of the L— S periodogram: 100 evenly spaced ob- 
servations, 10 5 Monte-Carlo trials, / maJ( T = 50. For comparison, 
the theoretical distribution curves for the periodograms z and z\ 
are also plotted (the curve for 23 almost coincides with that for 
z and is not shown). 
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Figure 3. Simulated vs. analytic false alarm probability for the 
L-S periodogram with forced data gapping. N = 100 observations 
were clumped in ten equal groups, each group spans (randomly) 
only a fifth fraction of its natural duration. About 1.7 - 10 5 Monte- 
Carlo trials, / max T = 5, 50, 500, 5000 (from left to right). 

crease the detectability of low-amplitude signals (if numer- 
ical simulations are not used). In this case, the error of the 
threshold level (i.e., the critical level z*, corresponding to a 
given FAP*) is more important. Examining Fig.|3]yields that 
the relative shift Az,/z, does not exceed 10% for FAP < 0.1. 
As an amplitude of corresponding signal scales as y/z, this 
turns into only 5% relative error of an amplitude threshold. 
Remind, that this 5% offset corresponds to a very strong 
aliasing. Such spectral leakage takes place, for instance, for 
the sequence of observations that are made during 10 days 
with only 4.8 night hours in a day, or during 10 years with 
only 2.4 observational months in a year. 

Note that the multiple-trial formula Q can work well 
in restricted regions only. When constructed from a short 
Monte-Carlo simulation, it can fit well the centre of the dis- 
tribution (i.e., large FAPs), but fails to fit low FAPs. This 
takes place even for negligible aliasing. The spectral leakage 
perturbs strongly the distribution centre but affects weakly 
its high-significance tail. Hence, any multiple-trial models 
constructed from short Monte-Carlo simulations cannot be 
extrapolated to the most important region of low false alarm 
probabilities. Such extrapolation overestimates statistical 
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Figure 4. Top: spectral window function for ELODIE radial 
velocities of 51 Peg. Bottom: Simulated and analytic FAP for 
this time series for 10 s Monte-Carlo trials, P m i n = l// m ax = 
100, 10, 1 days (from left to right). 
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Figure 5. The same as in Fig. [4] but for the star 70 Vir. 



Figure 6. The factor A(/ max ) and its approximations. 



indicating periodic gapping of observations. For the star 
70 Vir, the time series has T e g « 8.5 yrs, T ~ 7.2 yrs and 
posesses a more 'noisy' spectral window (Fig. [5]l indicating 
significant natural aliasing. In the first case, the simulated 
extreme value distributions for the L-S periodogram don't 
show large deviations from alias-free models (relative error 
A(FAP)/FAP < 30% and Az 4 /z» < 5% for FAP < 0.1). In 
the second case, the simulated FAP may be two times less 
than its alias-free approximation, but this still may be tol- 
erated because Az*/z, < 10% (again for FAP < 0.1). Note 
that the both time series possess a strong leakage with one 
day period. Such gapping affects extreme value distributions 
for P m in = l//max ^ 2 days only. In the case of 51 Peg, this 
aliasing could introduce a significant error in FAP for unre- 
alistic frequency ranges (say, for P m i n = 1/fmax < 0.1 days). 
In the case of 70 Vir, the respective deviation is enforced by 
low number of observations, what leads to rather large er- 
rors of FAP already for P m i n = 1 day. Note also that in the 
both cases the errors of alias-free approximations decrease 
significantly when FAP drops to the values 10 -3 + 10~ 2 . 

At last, we need to consider the quality of the alias- 
free approximation for the factor A(f max ). Fig. |S] shows a 
graph of the ratio A(/ max )/(T/ max ) along with graphs of 
its alias-free approximation and upper Carlson bound (see 
Appendix [B} . The observations were spanned in the same 
way as for Fig. [3] The spectral leakage appears only in small 
splashes near the Nyquist frequency of the periodic data 
breaks and near its overtones (i.e., at f ma ,^T = 4.5, 9.0, 13.5). 
For W > 3, the function ^4(/ max ) is well approximated by 
the alias-free model regardless the strong aliasing. 



significances of candidate periodicities, what favours to false 
alarms. 

The last pair of Monte-Carlo simulations in this pa- 
per deals with real astronomical time series. I used epochs 
and standard errors of 153 and 35 radial velocity measure- 
ments of the stars 51 P egasi and 70 Vir ginis, obtained with 
ELODIE spectrograph (|Naef et al.ll2004l n. These time series 
are not even. For the star 51 Peg, the effective time series 
length Toff w 9.0 yrs is close to the actual one T ~ 9.2 yrs, 
but the spectral window (Fig. [4| shows several high peaks 

3 Note that both stars 51 Peg and 70 Vir have a plan etary com- 
panion l lMavor fc Queloj|l99a iMarcv fc Butlerlll99rJ) . 



5 CONCLUSIONS 

The problem of estimating statistical significance of peri- 
odogram peaks is discussed in the paper. The results pub- 
lished in the field of extreme values of random processes are 
adapted for and extended to the periodogram analysis of as- 
tronomical time series. For the Lomb-Scargle periodogram 
and its modifications the corresponding extreme value distri- 
butions are given by a closed formulae being ready for usage. 
If the spectral leakage cannot be neglected, the similar ex- 
pressions provide upper limits to the false alarm probability 
(or lower ones to the significance). 

It is established numerically that the region of validity 
of these approximations is large and has no sharp bound- 
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aries. Even if the aliasing is very strong, the error of the an- 
alytic estimation of false alarm probability does not favour 
false alarms and thus is not fatal. For strong aliases, the 
usage of this analytic approximation slightly decreases the 
sensitivity to low-amplitude signals. However, the respective 
increasing of amplitude thresholds should not exceed several 
per cent in the worst practical cases (like a strong aliasing 
enforced by lack of observations). 

These results may be very useful in a wide variety of 
astronomical applications. They would be useful especially 
for systematic surveys that deal with large amounts of data 
consisting of many separate time series. Indeed, it would be 
very difficult and even impossible to perform Monte-Carlo 
simulation for every of such time series. Vice versa, it is easy 
to use simple analytic formulae (|llll2p or their analogs for 
the modified L-S periodograms. This will eliminate the need 
for Monte-Carlo simulations in the cases when the observed 
periodogram peak exceeds the adopted threshold and in the 
opposite cases when this peak is lower than this threshold 
by more than, say, 10%. The rare intermediate cases are 
easy to be studied by means of Monte-Carlo simulations. It 
is also admissible not to use numerical simulations at all, 
especially for large datasets (N > 100). In this case, the 
number of undetected low-amplitude periodicities may be 
increased by a negligible quantity only. 



ACKNOWLEDGMENTS 

I would thank Drs. V.V. Orlov, K.V. Kholshevnikov, 
L.P. Ossipkov and the anonymous referee for critical read- 
ing of this paper, fruitful suggestions and linguistic correc- 
tions. This work is supported by the Russian Foundation 
for Basic Research (Grants 05-02-17408, 06-02-16795) and 
by the President Grant NS-4929.2006.2 for the state sup- 
port of leading scientific schools. 



REFERENCES 

Azai's J.-M., Wschebor M., 2002, in Sidoravicius V., ed., 
In and Out of Equilibrium: Probability with a Physics 
Flavor. Vol. 51 of Prog. Prob., The distribution of the 
maximum of a Gaussian process: Rice method revisited. 
Birkhauser book, Boston, pp 321-348 

Cumming A., 2004, MNRAS, 354, 1165 

Cumming A., Marcy G. W., Butler R. P., 1999, ApJ, 526, 
890 

Davies R. B., 1977, Biometrika, 64, 247 
Davies R. B., 1987, Biometrika, 74, 33 
Davies R. B., 2002, Biometrika, 89, 484 
Ferraz-Mello S., 1981, AJ, 86, 619 
Home J. H., Baliunas S. L., 1986, ApJ, 302, 757 
Kratz M. F., 2006, Probability Surveys, 3, 230 
Lehman E. L., 1979, Testing Statistical Hypotheses [Rus- 
sian translation]. Nauka, Moscow 
Lomb N. R., 1976, Ap&SS, 39, 447 
Marcy G. W., Butler R. P., 1996, ApJ, 464, L147 
Mayor M., Queloz D, 1995, Nature, 378, 355 
Naef D., Mayor M., Beuzit J. L., Perrier C, Queloz D., 

Sivan J. P., Udry S., 2004, A&A, 414, 351 
Scargle J. D., 1982, ApJ, 263, 835 



Schwarzenberg-Czerny A., 1998a, MNRAS, 301, 831 
Schwarzenberg-Czerny A., 1998b, Baltic Astron., 7, 43 
Tee G, 2005, New Zealand Journ. Math., 34, 165 



APPENDIX A: SEVERAL NOTATIONS 

Let us introduce the following averaging operations: 

JV 

w«)> = 5>(ti)/*?, W) = m))/(i), 

i = l 

with of being the error variance at the observational epoch 
ti. The function (j>(t) may be defined at the set of ti only, 
that is to be a discrete sequence. The quantity {4>i(t)(/>2(t)) 
may be treated as a scalar p roduct in the Hilbert space 
( Sch warzenberg-Czernv|[l998al l . 

All vectors are assumed to be column ones by default. 
The notation {xi,X2, ■ ■ •} corresponds to a column vec- 
tor formed by the quantities inside the braces. Similarly, 
{xi, X2, • • •} is a vector constituted by elements of the vec- 
tors Xl,X2, ■ ■ ■ 

I is the identical matrix. 

* T denotes the transpose of a matrix or a vector. 

If x is a vector then x®x := xx T is a matrix constituted 
by the pairwise products XiXj ■ 

p(xi,X2, . . .) is the joint probability density of the ran- 
dom variables xi,x%,... and p(x\ — a\,X2 = <i2,...) 
is the same joint probability density, taken in the point 
(ai,a 2 , . . .). 



APPENDIX B: RICE METHOD AND 
PERIODOGRAMS 

In the so-called 'Rice method', one considers an integer 
random variable iV + (Zo, /max), the number of up-crossings 
of a given level Zo by the random process Z(f) within 
[0, /max]- The distribution function of the maximum Z max = 
max[ ,/ max ] Z(f) can be represented by the expansion 

Pr{Zmax s= Z } = Pr{Z(0) «S Z } Y, ^f-Vj, (Bl) 

j=o J - 

where uo = 1 and Vj being the conditional factorial mo- 
menta of -/V + (Zo, / m a X ) under condition that Z(0) ^ Zq. 
Let Vj being the unconditional factorial momenta of N + . 
The quantities Vj and Vj are explicitly expressed in terms of 
the so-called 'Rice formulae'. For instance, 

/max OO 

Sl(Z,/m«)= j df J Z'p(Z,Z')dZ', (B2) 



Vj(Z, /max) = J dfi ...dfj J Z[... Zj X 

[0,/ ma xP [0,tx>)3 

x p(Zi = Z, Z[;. . . ; Zj = Z, Zj) dZ[ . . . dZ' h (B3) 

where p(Z,Z') being the joint probability density of Z 
and Z' = dZ/df, both taken at the same frequency /, 
and p(Zi, Z[; . . . ; Zj , Zj) being the joint probability den- 
sity of the pairs Zi — Z(fi),Z' i = Z'(fi). For details on 
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the Rice method and fu rther references see the paper by 
lAzai's fc Wscheborl (|2002l ). 

The expected number of up-crossings plays an impor- 
tant role in what follows. For the sake of convenience, we 
introduce the synonymous notation r = v\. Exact ana- 
lytic expressions of r(Z, /max) for the pe riodograms z(f) and 
zi, 2 (f ) may be derived from results by IPaviesI (|l977l . 1 19871 . 
2002). Actually, Davies dealt with the case when the weights 
of measurements are equal to each other. Nevertheless, his 
results may be directly extended to the unequal weights. 
The quantity r provides not only the upper bound (0 on 
the false alarm probability, but also yields its asymptotic 
representation for large z (low FAP) levels. Unfortunately, 
the asymptotic character of the Davies bound was strictly 
proved only for restricted families of random processes, such 
as stationary Gaussian and stationary \ 2 ones. Nevertheless, 
this asymptotic seems to be non-specific to the distribution 
of the process values and to the strict stationariness (see 
references and di scussion in t he cited works by Davies and 
in the review bv lKratj (|2006h ). Hence, we may expect the 
asymptotic character of (J5]) for all of our periodograms. Note 
that the periodogram 2z(f) may be treated as a \ 2 random 
process, 2zi/d as an F process, and 2z\/Nn as a beta pro- 
cess, according to IPaviesI i|2002l ). 

The high-order Rice formulae are significantly more 
complicated with respect to the first-order one. We will 
not compute here the high-order Rice terms for our pe- 
riodograms in general case. However, the calculations are 
essentially simplified if the long-distance correlations of 
the periodogram may be neglected (equivalently, the alias- 
ing is negligible). Indeed, under the approximation stated 
the density p(Z\, Z[; . . . ; Zj , Zj) may be factorized as 
p(Zi, Z[) . . .p(Zj, Zj) for all frequencies except for the nar- 
row vicinities of the diagonals fi — fj . This property allows 
us to obtain that if / max is large enough to be resolved by 
the periodogram well, the relations Uj ~ Vj ~ t 3 hold true. 
Then the extreme value distribution of Z(f) is given by ©. 
An alternative way to obtain the lat ter expressio n is to as- 
sume a Poisson distribution for (Kratz 2006). 

The factor ^4(/ m ax) in equalities (|7I8|I determines the 
dependence on the frequency range (so-called bandwidth 
penalty). Before considering it, let us denote tp'f — dip/df 
and define the matrices 



Q = tfi ® ip, 
Qn 



<fin 



q = q-q£q h ' h q? 



R — R S^Q-j^S-h, 



(fi®ifi' f , R = ip' f ®ip' f , 

8> (fi, S n = <fin ® tfi' f , 
Qh,h = V>n ® <pn, 
S = S Q-^Q^^Sh, 
M = Q _1 (R S T Q _1 S). 



(B4) 



In general, all these matrices, except for the matrix Qn,n, 
depend on the frequency. Note the relations Sn = and 
S T + S = Q'. The definitions (|B4} look rather bulky, but 
they are essentially simplified under certain conditions. For 
example, if the base functions (fi for any / are orthogonal 
to the functions (fin, then = Sn = and the matrices 
in ()B4[) labelled with a tilde are equal to the same matrices 
without tilde mark. Also if the base (fi is orthonormal for 
any /, then Q = I, S T = -S and M = R + S 2 . The last 
matrix M(/) is necessarily positively definite and possesses 
the positive eigenvalues A^(/). In fact, we need below only 



these eigenvalues. They satisfy the characteristic equation 
det(R-S T Q _1 S-AQ) =0. 

The factor A(/max) appears implicitly in the papers 
by Davies after integration of the mathematical expecta- 
tion E|tj| by /, where the random vector r\ is Gaussian with 
zero mean an d statis t ically independent componets having 
IDrii — Xi(f). IPaviesI (|l987h gave some exact and approxi- 
mate integral formulae for this expectation. I present here 
(in terms of the factor A) a number of new integral repre- 
sentations that may be useful in practice. The first two may 
be derived easily and are given by 

/max 

-A(/max) = j df <j> m(<n) dfl, m 2 (n)—n T Mn, 
o s d 



A(fn 



Jdf J V^^L, 



(B5) 



a; 2 <1 



where dQ denotes an infinitesimal solid angle in R d , directed 
by the unit-length integration vector n. The integration in 
the first formula is performed over all possible directions 
within the whole space solid angle S d = 2iT d/2 /T(d/2). Note 
that the matrix M may be diagonalized by means of a solid- 
body rotation of n, so that m 2 = ^i n i an d the function 
A( fmax) is determined by the eigenvalues Xi only. Inner inte- 
grals in ()B5|I are equal to each other. They may be expressed 
in terms of the (hyper)area of an ellipsoidal (hyper)surface in 

— 1/2 

d dimensions, having semi-axes qi = \ i . Indeed, changing 
the integration variable in the inner integral in the second 
of equations ()B5|I as x = M 1//2 i, then x = in (n 2 = 1) and 
integrating by x we obtain 



i(n)dtt = n d VdetM, IL d = (fi M " t dQ. (B6) 

I (n T Mn)^ 

It can be directly checked that the integrand in the last 
expression represents an infinitesimal (within dfl) area el- 
ement on the surface x T Mx = x 2 n T Mh — 1, and 
equals to its total area. It is not hard to show that Hi = 2. 
The circumference of an ellipse, n2, and the usual surface 
area of an ellipsoid, can be expressed by means of el- 
liptic integrals (complete and incomplete, respectively). For 
d ^ 4 an Abelian integral can be used to compute Hd (|Teei 
2005). There are useful inequalities for n,j, e.g. the Carl- 
son's one bounds the inner integrals in (|B5|1 by the quantity 
S d v/(Ai + ... + A«i)/d = S d VTrM/d. Finally, 



A(fn 



n d ( ? i 



qi ■ ■ ■ Qd 



df < S d 



TrM 

d 



df. (B7) 



The latter inequality seems to be very sharp in practical 
situations (Fig. [6jl . Note also, that if every Ai(/) = A then 

^(/raax) = ^d/max'v/A. 

This paper has been typeset from a T^X/ FTeX file prepared 
by the author. 



